In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler,MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix, classification_report, roc_curve, auc
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from ELM import ELM
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Dense,Dropout
from tensorflow.keras.callbacks import EarlyStopping
np.random.seed(42)
print("Aktualnie ustawiony seed:", np.random.get_state()[1][0])
Aktualnie ustawiony seed: 42
In [ ]:
data = pd.read_csv('diagnosed_cbc_data_v4.csv')

1. Preliminary analysis¶

In [ ]:
data
Out[ ]:
WBC LYMp NEUTp LYMn NEUTn RBC HGB HCT MCV MCH MCHC PLT PDW PCT Diagnosis
0 10.00 43.200 50.100 4.30000 5.00000 2.77 7.3 24.2000 87.7 26.3 30.1 189.0 12.500000 0.17000 Normocytic hypochromic anemia
1 10.00 42.400 52.300 4.20000 5.30000 2.84 7.3 25.0000 88.2 25.7 20.2 180.0 12.500000 0.16000 Normocytic hypochromic anemia
2 7.20 30.700 60.700 2.20000 4.40000 3.97 9.0 30.5000 77.0 22.6 29.5 148.0 14.300000 0.14000 Iron deficiency anemia
3 6.00 30.200 63.500 1.80000 3.80000 4.22 3.8 32.8000 77.9 23.2 29.8 143.0 11.300000 0.12000 Iron deficiency anemia
4 4.20 39.100 53.700 1.60000 2.30000 3.93 0.4 316.0000 80.6 23.9 29.7 236.0 12.800000 0.22000 Normocytic hypochromic anemia
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1276 4.40 25.845 77.511 1.88076 5.14094 4.86 13.5 46.1526 80.7 27.7 34.4 180.0 14.312512 0.26028 Healthy
1277 5.60 25.845 77.511 1.88076 5.14094 4.85 15.0 46.1526 91.7 31.0 33.8 215.0 14.312512 0.26028 Healthy
1278 9.20 25.845 77.511 1.88076 5.14094 4.47 13.1 46.1526 88.7 29.3 33.0 329.0 14.312512 0.26028 Healthy
1279 6.48 25.845 77.511 1.88076 5.14094 4.75 13.2 46.1526 86.7 27.9 32.1 174.0 14.312512 0.26028 Healthy
1280 8.80 25.845 77.511 1.88076 5.14094 4.95 15.2 46.1526 89.7 30.6 34.2 279.0 14.312512 0.26028 Healthy

1281 rows × 15 columns

The table consists of 1281 rows and 16 columns, describing various blood parameters and medical diagnoses. Here is a description of each column:

  1. WBC - White Blood Cell count.
  2. LYMp - Lymphocyte percentage.
  3. NEUTp - Neutrophil percentage.
  4. LYMn - Lymphocyte count.
  5. NEUTn - Neutrophil count.
  6. RBC - Red Blood Cell count.
  7. HGB - Hemoglobin concentration.
  8. HCT - Hematocrit.
  9. MCV - Mean Corpuscular Volume.
  10. MCH - Mean Corpuscular Hemoglobin.
  11. MCHC - Mean Corpuscular Hemoglobin Concentration.
  12. PLT - Platelet count.
  13. PDW - Platelet Distribution Width.
  14. PCT - Plateletcrit.
  15. Diagnosis - Medical diagnosis.

The Diagnosis column contains values:

  1. Healthy - Indicates that the individual's blood test results are within normal ranges, showing no signs of anemia, infection, or other hematological disorders.

  2. Normocytic hypochromic anemia - Anemia characterized by red blood cells (RBCs) that are of normal size (normocytic) but have less hemoglobin than normal (hypochromic). This can occur in conditions such as chronic disease anemia or early iron deficiency anemia.

  3. Normocytic normochromic anemia - Anemia where the RBCs are normal in size and hemoglobin content but reduced in number. This type is often seen in chronic diseases, acute blood loss, or bone marrow disorders.

  4. Iron deficiency anemia - Anemia caused by a lack of iron, which is necessary for the production of hemoglobin. This results in microcytic (small) and hypochromic (pale) RBCs. Common causes include blood loss, poor diet, or malabsorption.

  5. Thrombocytopenia - A condition characterized by a low platelet count, which can lead to increased bleeding and bruising. Causes include bone marrow disorders, autoimmune diseases, and certain medications.

  6. Other microcytic anemia - Anemia with RBCs that are smaller than normal (microcytic), not specifically classified under iron deficiency. It can be caused by conditions like thalassemia or chronic disease.

  7. Leukemia - A type of cancer that affects blood and bone marrow, leading to an overproduction of abnormal white blood cells. Symptoms can include fatigue, frequent infections, and easy bruising.

  8. Macrocytic anemia - Anemia where the RBCs are larger than normal (macrocytic). It is often caused by deficiencies in vitamin B12 or folate, and it can also be due to alcoholism, liver disease, or certain medications.

  9. Leukemia with thrombocytopenia - A condition where an individual has both leukemia and a low platelet count. This combination can exacerbate the symptoms of both conditions, such as increased bleeding risk and fatigue.

In [ ]:
data['Diagnosis'].value_counts()
Out[ ]:
Diagnosis
Healthy                           336
Normocytic hypochromic anemia     279
Normocytic normochromic anemia    269
Iron deficiency anemia            189
Thrombocytopenia                   73
Other microcytic anemia            59
Leukemia                           47
Macrocytic anemia                  18
Leukemia with thrombocytopenia     11
Name: count, dtype: int64

The obvious problem of this data can be, that some of the types of anemia has a very little amount of occurence. Let's try to look on some visualisations, which can give us some informations.

In [ ]:
for col in data.columns:
    if col != 'Diagnosis':    
        plt.figure(figsize=(10, 6))
        sns.histplot(data=data, x=col, hue='Diagnosis', bins=100)
        plt.title(f'Histogram for column {col} relative to Diagnosis.')
        plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

As we can see, a considerable number of features, such as PCT, MCH, and HCT, are quite difficult to interpret and may be of little use. However, there are features that show greater differences across various diagnoses, such as PLT and WBC. For now, let's simplify our modeling by encoding all 'Healthy' diagnoses as Healthy (1) and all other records as Unhealthy (0), and then look at the same visualization again.

In [ ]:
data.loc[data['Diagnosis'] == 'Healthy', 'Healthy'] = 1
data.loc[data['Diagnosis'] != 'Healthy', 'Healthy'] = 0
In [ ]:
data.head()
Out[ ]:
WBC LYMp NEUTp LYMn NEUTn RBC HGB HCT MCV MCH MCHC PLT PDW PCT Diagnosis Healthy
0 10.0 43.2 50.1 4.3 5.0 2.77 7.3 24.2 87.7 26.3 30.1 189.0 12.5 0.17 Normocytic hypochromic anemia 0.0
1 10.0 42.4 52.3 4.2 5.3 2.84 7.3 25.0 88.2 25.7 20.2 180.0 12.5 0.16 Normocytic hypochromic anemia 0.0
2 7.2 30.7 60.7 2.2 4.4 3.97 9.0 30.5 77.0 22.6 29.5 148.0 14.3 0.14 Iron deficiency anemia 0.0
3 6.0 30.2 63.5 1.8 3.8 4.22 3.8 32.8 77.9 23.2 29.8 143.0 11.3 0.12 Iron deficiency anemia 0.0
4 4.2 39.1 53.7 1.6 2.3 3.93 0.4 316.0 80.6 23.9 29.7 236.0 12.8 0.22 Normocytic hypochromic anemia 0.0
In [ ]:
data['Healthy'].value_counts()
Out[ ]:
Healthy
0.0    945
1.0    336
Name: count, dtype: int64

Given the values displayed in the previous cells, these data are still not ideal for us due to a significant imbalance. There are over three times more individuals with 'Healthy = 0' than 'Healthy = 1'. This imbalance may impact the performance of our model, but we are not concerned about it.

In [ ]:
for col in data.drop(columns='Diagnosis').columns:
    if col != 'Healthy':    
        plt.figure(figsize=(10, 6))
        sns.histplot(data=data, x=col, hue='Healthy', bins=100)
        plt.title(f'Histogram for column {col} relative to Diagnosis.')
        plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

What has happened here is that we have ensured that columns such as:

  • PLT,
  • WBC,
  • HGB

play a leading role in our predictions. Let's see how our observations reflect the visualization of the correlation matrix using a heatmap.

In [ ]:
plt.figure(figsize=(16,8))
sns.heatmap(data.drop(columns='Diagnosis').corr(), annot=True)
Out[ ]:
<AxesSubplot:>
No description has been provided for this image

The high correlation between Healthy and PLT or HGB only reinforces our suspicions. Let's now gradually move on to building our model (or models). Of course, let's start by dividing our data into independent variables and dependent variables. Remember that for now, we will focus on binary classification, so let's not forget to remove the Diagnosis column in the process.

In [ ]:
X,y = data.drop(columns=['Diagnosis','Healthy']), data['Healthy']

Before we proceed to build the model correctly, let's check if we can somehow describe our dependent variables using two features. Here, PCA (Principal Component Analysis) will come to our aid, which will extract the most important information from our data. However, before we even apply PCA, we need to start with some data standardization, as PCA itself is very sensitive to it. Here, we'll use the StandardScaler function, which will transform our data in such a way that they have a mean equal to 0 and a standard deviation equal to 1. Then, we'll use the PCA function setting the parameter n_components = 2, allowing our unsupervised learning algorithm to extract two main feature components from the entire dataset. In both cases, we'll use the fit_transform() function right away because for now, we're operating on the entire dataset, not the divided one. Finally, we'll visualize our data.

In [ ]:
X_scaled = StandardScaler().fit_transform(X)
X_scaled_pca = PCA(n_components=2).fit_transform(X_scaled)

data_plot = pd.DataFrame(data=X_scaled_pca, columns=['PC1', 'PC2'])
data_plot['Healthy'] = y
In [ ]:
plt.figure(figsize=(8, 6))
sns.scatterplot(data=data_plot, x='PC1', y='PC2', hue='Healthy')
plt.title('PCA Chart')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()
No description has been provided for this image

2. Binary classification¶

As you can easily notice, distinguishing our two classes using only two features is practically impossible. We could have expected this by looking at the previous histplots, however, it was good to make sure.

Now we move on to the proper model construction. We start by splitting our dataset into training data and test data.

In [ ]:
X_train, X_test, y_train, y_test = train_test_split(X,y,random_state=42, stratify=y)
In [ ]:
X_train.shape, X_test.shape
Out[ ]:
((960, 14), (321, 14))

At the very beginning, we will assess the performance of logistic regression, testing several examples with different values of the regularization parameter.

In [ ]:
pipeline = make_pipeline(
    StandardScaler(),
    PCA(),
    LogisticRegression()
)

param_grid =[
    {
        'pca__n_components': [4, 6, 8, 10, 12, 14],
        'logisticregression__C': [0.001, 0.01, 0.1, 1, 10, 100, 1000], 
    }
]

gridsearch1 = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=10).fit(X_train,y_train)
In [ ]:
gridsearch1.best_estimator_
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=12)),
                ('logisticregression', LogisticRegression(C=100))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=12)),
                ('logisticregression', LogisticRegression(C=100))])
StandardScaler()
PCA(n_components=12)
LogisticRegression(C=100)
In [ ]:
gridsearch1.best_score_
Out[ ]:
0.8979166666666666
In [ ]:
gridsearch1.score(X_test,y_test)
Out[ ]:
0.8691588785046729
In [ ]:
y_scores = gridsearch1.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
No description has been provided for this image
In [ ]:
plt.figure(figsize=(10,5))
sns.heatmap(confusion_matrix(y_test,gridsearch1.predict(X_test)), annot=True, fmt='d')
plt.xlabel('Predict')
plt.ylabel('Actuval Value')
Out[ ]:
Text(95.72222222222221, 0.5, 'Actuval Value')
No description has been provided for this image

As we can see, logistic regression alone can achieve a fairly high score, reaching up to 87% on the training set. This is already quite satisfactory for us. However, let's try something more complex. We move on to Support Vector Machines (SVM), which in the scikit-learn package are defined as SVR for regression and SVC for classification. To expand our test panel, we won't rely on just one type of kernel. We'll use the RBF kernel, based on the radial basis function, as well as the linear kernel. The grid of parameters created is quite extensive, which may lead to longer waiting times for the final result.

In [ ]:
pipeline1 = make_pipeline(
    StandardScaler(),
    PCA(),
    SVC(probability=True)
)
In [ ]:
param_grid =[
    {
        'pca__n_components': [10, 12, 14],
        "svc__kernel": ['rbf'], 
        'svc__C': [0.01, 0.1, 1, 10, 50, 100], 
        'svc__gamma': [0.001, 0.01, 0.1, 1, 10, 50]
    },
    {
        'pca__n_components': [10, 12, 14],
        'svc__kernel': ['linear'], 
        'svc__C': [0.001, 0.01, 0.1, 1, 10, 50, 100]
    }
]
In [ ]:
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X_train,y_train)
In [ ]:
gridsearch1.best_estimator_
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc', SVC(C=50, gamma=0.1, probability=True))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc', SVC(C=50, gamma=0.1, probability=True))])
StandardScaler()
PCA(n_components=14)
SVC(C=50, gamma=0.1, probability=True)
In [ ]:
gridsearch1.best_score_
Out[ ]:
0.9541666666666666
In [ ]:
gridsearch1.score(X_test,y_test)
Out[ ]:
0.956386292834891
In [ ]:
plt.figure(figsize=(10,5))
sns.heatmap(confusion_matrix(y_test,gridsearch1.predict(X_test)), annot=True, fmt='d')
plt.xlabel('Predict')
plt.ylabel('Actuval Value')
Out[ ]:
Text(95.72222222222221, 0.5, 'Actuval Value')
No description has been provided for this image
In [ ]:
y_scores = gridsearch1.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
No description has been provided for this image

As we can see, right from the start, we obtained a model that has over 95% accuracy on the test set, which may already be quite satisfactory for us. Our model selected the version with 14 components, which is the same number of initial features. However, there's no time to rest on our laurels; let's visualize how this accuracy looks against all parameters. It seems to me that the best idea would be to visualize specifically for n_components = 14, so we'll adopt that tactic.

In [ ]:
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14].dropna() #dropna usuwa nam wiersze gdzie nie ma gammy, czyli te linearne svm
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
In [ ]:
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")

sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
No description has been provided for this image

What conclusions can we draw from the above analysis? Certainly, we can see that we can shift significantly to the "right" in terms of the C parameter values. It is in this direction that the accuracy of models increases, so we can expect that with an increase in the C parameter, our model should be more accurate. Let's check our conclusions in practice.

In [ ]:
param_grid =[
    {
        'pca__n_components': [12,13,14],
        "svc__kernel": ['rbf'], 
        'svc__C': [10, 50, 100 , 500, 1000, 5000, 10000], 
        'svc__gamma': [0.00001,0.0001,0.001, 0.01, 0.1, 1, 10]
    },
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X_train,y_train)
gridsearch1.best_estimator_
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc', SVC(C=5000, gamma=0.01, probability=True))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc', SVC(C=5000, gamma=0.01, probability=True))])
StandardScaler()
PCA(n_components=14)
SVC(C=5000, gamma=0.01, probability=True)
In [ ]:
gridsearch1.best_score_
Out[ ]:
0.9572916666666668
In [ ]:
gridsearch1.score(X_test,y_test)
Out[ ]:
0.9470404984423676
In [ ]:
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
In [ ]:
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")

sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
No description has been provided for this image
In [ ]:
y_scores = gridsearch1.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
No description has been provided for this image

What results did we get? Theoretically, the model improved, but it improved on the training set, which doesn't entirely satisfy us. We would prefer our model to generalize as well as possible rather than perform optimally on the training set. However, there's no need to lose hope; let's try manipulating the grid search parameters a bit more, now focusing only on n_components = 14.

In [ ]:
param_grid =[
    {
        'pca__n_components': [14],
        "svc__kernel": ['rbf'], 
        'svc__C': np.arange(1000,11000,1000), 
        'svc__gamma': np.linspace(0.001, 0.1,10)
    },
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=5).fit(X_train,y_train)
gridsearch1.best_estimator_
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc',
                 SVC(C=1000, gamma=0.023000000000000003, probability=True))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc',
                 SVC(C=1000, gamma=0.023000000000000003, probability=True))])
StandardScaler()
PCA(n_components=14)
SVC(C=1000, gamma=0.023000000000000003, probability=True)
In [ ]:
gridsearch1.score(X_test,y_test)
Out[ ]:
0.9532710280373832
In [ ]:
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
In [ ]:
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")

sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
No description has been provided for this image
In [ ]:
param_grid =[
    {
        'pca__n_components': [14],
        "svc__kernel": ['rbf'], 
        'svc__C': np.linspace(500,1100,10), 
        'svc__gamma': np.linspace(0.01, 0.04,10)
    },
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=5).fit(X_train,y_train)
gridsearch1.best_estimator_
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc',
                 SVC(C=766.6666666666667, gamma=0.026666666666666665,
                     probability=True))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc',
                 SVC(C=766.6666666666667, gamma=0.026666666666666665,
                     probability=True))])
StandardScaler()
PCA(n_components=14)
SVC(C=766.6666666666667, gamma=0.026666666666666665, probability=True)
In [ ]:
gridsearch1.score(X_test,y_test)
Out[ ]:
0.956386292834891
In [ ]:
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
In [ ]:
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")

sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
No description has been provided for this image
In [ ]:
param_grid =[
    {
        'pca__n_components': [14],
        "svc__kernel": ['rbf'], 
        'svc__C': np.linspace(6000,10000,10), 
        'svc__gamma': np.linspace(0.05, 0.2,10)
    },
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=5).fit(X_train,y_train)
gridsearch1.best_estimator_
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc',
                 SVC(C=6000.0, gamma=0.11666666666666668, probability=True))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc',
                 SVC(C=6000.0, gamma=0.11666666666666668, probability=True))])
StandardScaler()
PCA(n_components=14)
SVC(C=6000.0, gamma=0.11666666666666668, probability=True)
In [ ]:
gridsearch1.score(X_test,y_test)
Out[ ]:
0.956386292834891
In [ ]:
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
In [ ]:
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")

sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
No description has been provided for this image
In [ ]:
param_grid =[
    {
        'pca__n_components': [14],
        "svc__kernel": ['rbf'], 
        'svc__C': np.linspace(3000,13000,15), 
        'svc__gamma': np.linspace(0.05, 0.2,10)
    },
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=5).fit(X_train,y_train)
gridsearch1.best_estimator_
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc',
                 SVC(C=3000.0, gamma=0.15000000000000002, probability=True))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc',
                 SVC(C=3000.0, gamma=0.15000000000000002, probability=True))])
StandardScaler()
PCA(n_components=14)
SVC(C=3000.0, gamma=0.15000000000000002, probability=True)
In [ ]:
gridsearch1.score(X_test,y_test)
Out[ ]:
0.9532710280373832
In [ ]:
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
In [ ]:
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")

plt.figure(figsize=(20,10))
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
No description has been provided for this image
In [ ]:
param_grid =[
    {
        'pca__n_components': [14],
        "svc__kernel": ['rbf'], 
        'svc__C': np.linspace(3000,100000000,30), 
        'svc__gamma': np.linspace(0.05, 0.2,10)
    },
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X_train,y_train)
gridsearch1.best_estimator_
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc',
                 SVC(C=3000.0, gamma=0.16666666666666669, probability=True))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc',
                 SVC(C=3000.0, gamma=0.16666666666666669, probability=True))])
StandardScaler()
PCA(n_components=14)
SVC(C=3000.0, gamma=0.16666666666666669, probability=True)
In [ ]:
gridsearch1.score(X_test,y_test)
Out[ ]:
0.956386292834891
In [ ]:
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
In [ ]:
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")

plt.figure(figsize=(20,10))
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
No description has been provided for this image
In [ ]:
plt.figure(figsize=(10,5))
sns.heatmap(confusion_matrix(y_test,gridsearch1.predict(X_test)), annot=True, fmt='d')
plt.xlabel('Predict')
plt.ylabel('Actuval Value')
Out[ ]:
Text(95.72222222222221, 0.5, 'Actuval Value')
No description has been provided for this image
In [ ]:
y_scores = gridsearch1.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_scores)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
No description has been provided for this image

After trying several different configurations, on various intervals, with different numbers of steps, where did it lead us? Unfortunately, we didn't find a better model than our initial one, but there's no need to be discouraged because the model is still satisfactory, if not very good. An accuracy level of nearly 96% is impressive. Now we can try using the TensorFlow package and create a neural network based on gradient-based methods.

In [ ]:
model = Sequential()
model.add(layers.Flatten(input_shape=(14,1)))
model.add(layers.Dense(200,activation='relu'))
model.add(layers.Dense(2, activation='softmax'))
model.summary()
/home/pawel/.local/lib/python3.10/site-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)
2024-06-09 14:15:14.526827: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten (Flatten)               │ (None, 14)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 200)            │         3,000 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 2)              │           402 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 3,402 (13.29 KB)
 Trainable params: 3,402 (13.29 KB)
 Non-trainable params: 0 (0.00 B)
In [ ]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
In [ ]:
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
In [ ]:
scalar = StandardScaler().fit(X_train)
X_train_scaled = scalar.transform(X_train)
X_test_scaled = scalar.transform(X_test)
In [ ]:
history = model.fit(X_train_scaled,y_train, batch_size=1024,epochs=100, validation_data=(X_test_scaled,y_test))
Epoch 1/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 84ms/step - accuracy: 0.6177 - loss: 0.6454 - val_accuracy: 0.6573 - val_loss: 0.6425
Epoch 2/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 37ms/step - accuracy: 0.6323 - loss: 0.6399 - val_accuracy: 0.6885 - val_loss: 0.6362
Epoch 3/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.6646 - loss: 0.6324 - val_accuracy: 0.6947 - val_loss: 0.6288
Epoch 4/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 58ms/step - accuracy: 0.6958 - loss: 0.6236 - val_accuracy: 0.6916 - val_loss: 0.6207
Epoch 5/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step - accuracy: 0.7188 - loss: 0.6141 - val_accuracy: 0.7103 - val_loss: 0.6121
Epoch 6/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.7260 - loss: 0.6041 - val_accuracy: 0.7227 - val_loss: 0.6033
Epoch 7/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.7292 - loss: 0.5939 - val_accuracy: 0.7321 - val_loss: 0.5945
Epoch 8/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 40ms/step - accuracy: 0.7302 - loss: 0.5837 - val_accuracy: 0.7383 - val_loss: 0.5858
Epoch 9/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.7333 - loss: 0.5736 - val_accuracy: 0.7383 - val_loss: 0.5772
Epoch 10/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.7365 - loss: 0.5637 - val_accuracy: 0.7321 - val_loss: 0.5688
Epoch 11/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 56ms/step - accuracy: 0.7354 - loss: 0.5541 - val_accuracy: 0.7321 - val_loss: 0.5606
Epoch 12/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 60ms/step - accuracy: 0.7354 - loss: 0.5448 - val_accuracy: 0.7352 - val_loss: 0.5528
Epoch 13/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.7354 - loss: 0.5359 - val_accuracy: 0.7352 - val_loss: 0.5453
Epoch 14/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.7354 - loss: 0.5274 - val_accuracy: 0.7352 - val_loss: 0.5381
Epoch 15/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.7354 - loss: 0.5193 - val_accuracy: 0.7352 - val_loss: 0.5312
Epoch 16/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.7354 - loss: 0.5116 - val_accuracy: 0.7352 - val_loss: 0.5247
Epoch 17/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.7354 - loss: 0.5042 - val_accuracy: 0.7352 - val_loss: 0.5186
Epoch 18/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.7344 - loss: 0.4973 - val_accuracy: 0.7352 - val_loss: 0.5127
Epoch 19/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.7344 - loss: 0.4906 - val_accuracy: 0.7352 - val_loss: 0.5071
Epoch 20/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.7344 - loss: 0.4843 - val_accuracy: 0.7352 - val_loss: 0.5018
Epoch 21/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.7344 - loss: 0.4783 - val_accuracy: 0.7352 - val_loss: 0.4967
Epoch 22/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step - accuracy: 0.7344 - loss: 0.4726 - val_accuracy: 0.7352 - val_loss: 0.4919
Epoch 23/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.7354 - loss: 0.4671 - val_accuracy: 0.7352 - val_loss: 0.4874
Epoch 24/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.7354 - loss: 0.4619 - val_accuracy: 0.7352 - val_loss: 0.4830
Epoch 25/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.7354 - loss: 0.4570 - val_accuracy: 0.7352 - val_loss: 0.4788
Epoch 26/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.7354 - loss: 0.4523 - val_accuracy: 0.7352 - val_loss: 0.4749
Epoch 27/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.7354 - loss: 0.4477 - val_accuracy: 0.7352 - val_loss: 0.4711
Epoch 28/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.7354 - loss: 0.4434 - val_accuracy: 0.7352 - val_loss: 0.4674
Epoch 29/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.7354 - loss: 0.4393 - val_accuracy: 0.7352 - val_loss: 0.4640
Epoch 30/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.7354 - loss: 0.4353 - val_accuracy: 0.7352 - val_loss: 0.4607
Epoch 31/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.7365 - loss: 0.4315 - val_accuracy: 0.7352 - val_loss: 0.4575
Epoch 32/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 56ms/step - accuracy: 0.7365 - loss: 0.4278 - val_accuracy: 0.7352 - val_loss: 0.4545
Epoch 33/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 56ms/step - accuracy: 0.7365 - loss: 0.4242 - val_accuracy: 0.7383 - val_loss: 0.4515
Epoch 34/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.7365 - loss: 0.4208 - val_accuracy: 0.7445 - val_loss: 0.4487
Epoch 35/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.7385 - loss: 0.4175 - val_accuracy: 0.7508 - val_loss: 0.4459
Epoch 36/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.7396 - loss: 0.4143 - val_accuracy: 0.7508 - val_loss: 0.4432
Epoch 37/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 87ms/step - accuracy: 0.7406 - loss: 0.4111 - val_accuracy: 0.7508 - val_loss: 0.4406
Epoch 38/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 98ms/step - accuracy: 0.7427 - loss: 0.4081 - val_accuracy: 0.7539 - val_loss: 0.4380
Epoch 39/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.7552 - loss: 0.4051 - val_accuracy: 0.7570 - val_loss: 0.4356
Epoch 40/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.7563 - loss: 0.4022 - val_accuracy: 0.7664 - val_loss: 0.4332
Epoch 41/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.7604 - loss: 0.3994 - val_accuracy: 0.7695 - val_loss: 0.4310
Epoch 42/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.7677 - loss: 0.3967 - val_accuracy: 0.7819 - val_loss: 0.4288
Epoch 43/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.7760 - loss: 0.3940 - val_accuracy: 0.7882 - val_loss: 0.4267
Epoch 44/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.7865 - loss: 0.3914 - val_accuracy: 0.7944 - val_loss: 0.4247
Epoch 45/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 59ms/step - accuracy: 0.7979 - loss: 0.3889 - val_accuracy: 0.8006 - val_loss: 0.4227
Epoch 46/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.8094 - loss: 0.3864 - val_accuracy: 0.8069 - val_loss: 0.4208
Epoch 47/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.8167 - loss: 0.3839 - val_accuracy: 0.8100 - val_loss: 0.4189
Epoch 48/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 0.8260 - loss: 0.3815 - val_accuracy: 0.8224 - val_loss: 0.4171
Epoch 49/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.8323 - loss: 0.3791 - val_accuracy: 0.8255 - val_loss: 0.4154
Epoch 50/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.8448 - loss: 0.3767 - val_accuracy: 0.8318 - val_loss: 0.4137
Epoch 51/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 60ms/step - accuracy: 0.8490 - loss: 0.3744 - val_accuracy: 0.8442 - val_loss: 0.4122
Epoch 52/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.8573 - loss: 0.3721 - val_accuracy: 0.8505 - val_loss: 0.4107
Epoch 53/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.8635 - loss: 0.3698 - val_accuracy: 0.8536 - val_loss: 0.4092
Epoch 54/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 55ms/step - accuracy: 0.8708 - loss: 0.3675 - val_accuracy: 0.8598 - val_loss: 0.4079
Epoch 55/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 66ms/step - accuracy: 0.8781 - loss: 0.3653 - val_accuracy: 0.8692 - val_loss: 0.4065
Epoch 56/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.8844 - loss: 0.3630 - val_accuracy: 0.8754 - val_loss: 0.4052
Epoch 57/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.8844 - loss: 0.3608 - val_accuracy: 0.8754 - val_loss: 0.4040
Epoch 58/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.8875 - loss: 0.3587 - val_accuracy: 0.8816 - val_loss: 0.4028
Epoch 59/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.8896 - loss: 0.3565 - val_accuracy: 0.8816 - val_loss: 0.4016
Epoch 60/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.8906 - loss: 0.3543 - val_accuracy: 0.8816 - val_loss: 0.4005
Epoch 61/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.8938 - loss: 0.3522 - val_accuracy: 0.8847 - val_loss: 0.3995
Epoch 62/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 58ms/step - accuracy: 0.8969 - loss: 0.3501 - val_accuracy: 0.8879 - val_loss: 0.3984
Epoch 63/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.8969 - loss: 0.3480 - val_accuracy: 0.8879 - val_loss: 0.3974
Epoch 64/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.8990 - loss: 0.3459 - val_accuracy: 0.8879 - val_loss: 0.3963
Epoch 65/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.9031 - loss: 0.3438 - val_accuracy: 0.8879 - val_loss: 0.3953
Epoch 66/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.9021 - loss: 0.3418 - val_accuracy: 0.8879 - val_loss: 0.3942
Epoch 67/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 44ms/step - accuracy: 0.9031 - loss: 0.3397 - val_accuracy: 0.8879 - val_loss: 0.3932
Epoch 68/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.9031 - loss: 0.3377 - val_accuracy: 0.8879 - val_loss: 0.3921
Epoch 69/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 43ms/step - accuracy: 0.9042 - loss: 0.3357 - val_accuracy: 0.8879 - val_loss: 0.3911
Epoch 70/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.9052 - loss: 0.3337 - val_accuracy: 0.8910 - val_loss: 0.3901
Epoch 71/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.9052 - loss: 0.3317 - val_accuracy: 0.8910 - val_loss: 0.3891
Epoch 72/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 0.9062 - loss: 0.3298 - val_accuracy: 0.8910 - val_loss: 0.3882
Epoch 73/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.9062 - loss: 0.3278 - val_accuracy: 0.8910 - val_loss: 0.3872
Epoch 74/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.9052 - loss: 0.3259 - val_accuracy: 0.8910 - val_loss: 0.3862
Epoch 75/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 57ms/step - accuracy: 0.9052 - loss: 0.3240 - val_accuracy: 0.8910 - val_loss: 0.3853
Epoch 76/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.9062 - loss: 0.3221 - val_accuracy: 0.8879 - val_loss: 0.3843
Epoch 77/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.9073 - loss: 0.3202 - val_accuracy: 0.8879 - val_loss: 0.3833
Epoch 78/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step - accuracy: 0.9083 - loss: 0.3184 - val_accuracy: 0.8879 - val_loss: 0.3823
Epoch 79/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 51ms/step - accuracy: 0.9083 - loss: 0.3165 - val_accuracy: 0.8879 - val_loss: 0.3813
Epoch 80/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.9083 - loss: 0.3147 - val_accuracy: 0.8910 - val_loss: 0.3803
Epoch 81/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step - accuracy: 0.9073 - loss: 0.3129 - val_accuracy: 0.8879 - val_loss: 0.3793
Epoch 82/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 54ms/step - accuracy: 0.9052 - loss: 0.3110 - val_accuracy: 0.8879 - val_loss: 0.3783
Epoch 83/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.9062 - loss: 0.3092 - val_accuracy: 0.8879 - val_loss: 0.3772
Epoch 84/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 75ms/step - accuracy: 0.9062 - loss: 0.3074 - val_accuracy: 0.8879 - val_loss: 0.3760
Epoch 85/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 119ms/step - accuracy: 0.9083 - loss: 0.3057 - val_accuracy: 0.8847 - val_loss: 0.3749
Epoch 86/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 52ms/step - accuracy: 0.9083 - loss: 0.3039 - val_accuracy: 0.8847 - val_loss: 0.3738
Epoch 87/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.9083 - loss: 0.3022 - val_accuracy: 0.8847 - val_loss: 0.3727
Epoch 88/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 0.9073 - loss: 0.3004 - val_accuracy: 0.8847 - val_loss: 0.3715
Epoch 89/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.9073 - loss: 0.2987 - val_accuracy: 0.8847 - val_loss: 0.3703
Epoch 90/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 47ms/step - accuracy: 0.9073 - loss: 0.2970 - val_accuracy: 0.8847 - val_loss: 0.3691
Epoch 91/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.9073 - loss: 0.2954 - val_accuracy: 0.8847 - val_loss: 0.3679
Epoch 92/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 50ms/step - accuracy: 0.9073 - loss: 0.2937 - val_accuracy: 0.8816 - val_loss: 0.3668
Epoch 93/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.9062 - loss: 0.2921 - val_accuracy: 0.8816 - val_loss: 0.3656
Epoch 94/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.9052 - loss: 0.2905 - val_accuracy: 0.8816 - val_loss: 0.3644
Epoch 95/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 49ms/step - accuracy: 0.9052 - loss: 0.2889 - val_accuracy: 0.8816 - val_loss: 0.3633
Epoch 96/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 53ms/step - accuracy: 0.9052 - loss: 0.2873 - val_accuracy: 0.8816 - val_loss: 0.3623
Epoch 97/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 48ms/step - accuracy: 0.9062 - loss: 0.2857 - val_accuracy: 0.8816 - val_loss: 0.3613
Epoch 98/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 0.9052 - loss: 0.2842 - val_accuracy: 0.8816 - val_loss: 0.3603
Epoch 99/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 58ms/step - accuracy: 0.9062 - loss: 0.2826 - val_accuracy: 0.8816 - val_loss: 0.3594
Epoch 100/100
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 45ms/step - accuracy: 0.9073 - loss: 0.2811 - val_accuracy: 0.8816 - val_loss: 0.3586
In [ ]:
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
Out[ ]:
[<matplotlib.lines.Line2D at 0x7965d10c3340>]
No description has been provided for this image
In [ ]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
Out[ ]:
[<matplotlib.lines.Line2D at 0x7965d1a92e30>]
No description has been provided for this image
In [ ]:
model.evaluate(X_test_scaled,y_test)
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.8901 - loss: 0.3365 
Out[ ]:
[0.35858526825904846, 0.881619930267334]
In [ ]:
model = Sequential()
model.add(layers.Flatten(input_shape=(14,1)))
model.add(layers.Dense(14,activation='relu'))
model.add(Dropout(0.1))
model.add(layers.Dense(7,activation='relu'))
model.add(Dropout(0.1))
model.add(layers.Dense(3,activation='relu'))
model.add(Dropout(0.1))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss = 'binary_crossentropy', optimizer='adam', metrics=['accuracy'])
/home/pawel/.local/lib/python3.10/site-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)
In [ ]:
early_stop = EarlyStopping(monitor = 'val_loss',mode='min',verbose=1,patience=25)
In [ ]:
model.fit(x = X_train_scaled,y = y_train,batch_size=2,validation_data=(X_test_scaled,y_test),epochs=600,callbacks=[early_stop])
Epoch 1/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.5999 - loss: 0.7037 - val_accuracy: 0.8629 - val_loss: 0.5071
Epoch 2/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8463 - loss: 0.4758 - val_accuracy: 0.8941 - val_loss: 0.3558
Epoch 3/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8610 - loss: 0.3600 - val_accuracy: 0.8941 - val_loss: 0.3922
Epoch 4/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8592 - loss: 0.3032 - val_accuracy: 0.9159 - val_loss: 0.4287
Epoch 5/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8646 - loss: 0.2865 - val_accuracy: 0.8972 - val_loss: 0.3573
Epoch 6/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8874 - loss: 0.2465 - val_accuracy: 0.9128 - val_loss: 0.3446
Epoch 7/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8511 - loss: 0.2765 - val_accuracy: 0.9159 - val_loss: 0.3460
Epoch 8/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8737 - loss: 0.2588 - val_accuracy: 0.9190 - val_loss: 0.3268
Epoch 9/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8805 - loss: 0.2401 - val_accuracy: 0.9252 - val_loss: 0.3315
Epoch 10/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9126 - loss: 0.2021 - val_accuracy: 0.9283 - val_loss: 0.3231
Epoch 11/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8939 - loss: 0.2384 - val_accuracy: 0.9283 - val_loss: 0.3096
Epoch 12/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9131 - loss: 0.1935 - val_accuracy: 0.9315 - val_loss: 0.3261
Epoch 13/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9008 - loss: 0.2062 - val_accuracy: 0.9346 - val_loss: 0.3041
Epoch 14/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8996 - loss: 0.2066 - val_accuracy: 0.9283 - val_loss: 0.3168
Epoch 15/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9188 - loss: 0.2099 - val_accuracy: 0.9283 - val_loss: 0.3238
Epoch 16/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9219 - loss: 0.1771 - val_accuracy: 0.9315 - val_loss: 0.2851
Epoch 17/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9144 - loss: 0.1857 - val_accuracy: 0.9252 - val_loss: 0.2705
Epoch 18/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9481 - loss: 0.1627 - val_accuracy: 0.9408 - val_loss: 0.2776
Epoch 19/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9267 - loss: 0.1887 - val_accuracy: 0.9377 - val_loss: 0.3052
Epoch 20/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9472 - loss: 0.1389 - val_accuracy: 0.9439 - val_loss: 0.2388
Epoch 21/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9256 - loss: 0.1715 - val_accuracy: 0.9408 - val_loss: 0.1976
Epoch 22/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9435 - loss: 0.1519 - val_accuracy: 0.9377 - val_loss: 0.2005
Epoch 23/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9292 - loss: 0.1515 - val_accuracy: 0.9408 - val_loss: 0.2068
Epoch 24/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9555 - loss: 0.1073 - val_accuracy: 0.9439 - val_loss: 0.1894
Epoch 25/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9668 - loss: 0.1221 - val_accuracy: 0.9533 - val_loss: 0.2097
Epoch 26/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9578 - loss: 0.1224 - val_accuracy: 0.9408 - val_loss: 0.1786
Epoch 27/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9711 - loss: 0.1062 - val_accuracy: 0.9564 - val_loss: 0.1781
Epoch 28/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9709 - loss: 0.1005 - val_accuracy: 0.9377 - val_loss: 0.1857
Epoch 29/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9624 - loss: 0.1017 - val_accuracy: 0.9595 - val_loss: 0.1962
Epoch 30/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9713 - loss: 0.0832 - val_accuracy: 0.9626 - val_loss: 0.1755
Epoch 31/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9634 - loss: 0.1000 - val_accuracy: 0.9533 - val_loss: 0.1675
Epoch 32/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9658 - loss: 0.1134 - val_accuracy: 0.9626 - val_loss: 0.1135
Epoch 33/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9713 - loss: 0.0807 - val_accuracy: 0.9626 - val_loss: 0.1236
Epoch 34/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9736 - loss: 0.0991 - val_accuracy: 0.9688 - val_loss: 0.0994
Epoch 35/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9650 - loss: 0.0890 - val_accuracy: 0.9688 - val_loss: 0.0958
Epoch 36/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9701 - loss: 0.0918 - val_accuracy: 0.9595 - val_loss: 0.1252
Epoch 37/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9694 - loss: 0.1541 - val_accuracy: 0.9595 - val_loss: 0.1054
Epoch 38/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9759 - loss: 0.0865 - val_accuracy: 0.9626 - val_loss: 0.1012
Epoch 39/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9732 - loss: 0.0886 - val_accuracy: 0.9626 - val_loss: 0.1013
Epoch 40/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9826 - loss: 0.0719 - val_accuracy: 0.9657 - val_loss: 0.0985
Epoch 41/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9771 - loss: 0.0716 - val_accuracy: 0.9657 - val_loss: 0.1027
Epoch 42/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9758 - loss: 0.0709 - val_accuracy: 0.9626 - val_loss: 0.0946
Epoch 43/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9841 - loss: 0.0551 - val_accuracy: 0.9626 - val_loss: 0.1016
Epoch 44/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9828 - loss: 0.0740 - val_accuracy: 0.9595 - val_loss: 0.1850
Epoch 45/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9689 - loss: 0.0842 - val_accuracy: 0.9657 - val_loss: 0.1265
Epoch 46/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9751 - loss: 0.0869 - val_accuracy: 0.9657 - val_loss: 0.1114
Epoch 47/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9790 - loss: 0.0639 - val_accuracy: 0.9657 - val_loss: 0.1324
Epoch 48/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9796 - loss: 0.0783 - val_accuracy: 0.9657 - val_loss: 0.1028
Epoch 49/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9840 - loss: 0.0614 - val_accuracy: 0.9720 - val_loss: 0.0903
Epoch 50/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9791 - loss: 0.0792 - val_accuracy: 0.9720 - val_loss: 0.0905
Epoch 51/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9824 - loss: 0.0649 - val_accuracy: 0.9720 - val_loss: 0.0889
Epoch 52/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9770 - loss: 0.0684 - val_accuracy: 0.9720 - val_loss: 0.0880
Epoch 53/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9823 - loss: 0.0789 - val_accuracy: 0.9657 - val_loss: 0.1271
Epoch 54/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9789 - loss: 0.0647 - val_accuracy: 0.9657 - val_loss: 0.1772
Epoch 55/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9798 - loss: 0.0683 - val_accuracy: 0.9688 - val_loss: 0.1203
Epoch 56/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9890 - loss: 0.0439 - val_accuracy: 0.9688 - val_loss: 0.1985
Epoch 57/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9884 - loss: 0.0564 - val_accuracy: 0.9657 - val_loss: 0.0853
Epoch 58/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9850 - loss: 0.0577 - val_accuracy: 0.9688 - val_loss: 0.0905
Epoch 59/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9932 - loss: 0.0512 - val_accuracy: 0.9688 - val_loss: 0.0924
Epoch 60/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9790 - loss: 0.0668 - val_accuracy: 0.9720 - val_loss: 0.1118
Epoch 61/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9860 - loss: 0.0550 - val_accuracy: 0.9688 - val_loss: 0.1321
Epoch 62/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9831 - loss: 0.0583 - val_accuracy: 0.9657 - val_loss: 0.1553
Epoch 63/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9843 - loss: 0.0707 - val_accuracy: 0.9688 - val_loss: 0.0843
Epoch 64/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9698 - loss: 0.0783 - val_accuracy: 0.9688 - val_loss: 0.0974
Epoch 65/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9865 - loss: 0.0495 - val_accuracy: 0.9751 - val_loss: 0.0911
Epoch 66/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9809 - loss: 0.0686 - val_accuracy: 0.9720 - val_loss: 0.1043
Epoch 67/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9793 - loss: 0.0781 - val_accuracy: 0.9688 - val_loss: 0.1415
Epoch 68/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9828 - loss: 0.0571 - val_accuracy: 0.9688 - val_loss: 0.1699
Epoch 69/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9841 - loss: 0.0564 - val_accuracy: 0.9720 - val_loss: 0.1128
Epoch 70/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9924 - loss: 0.0394 - val_accuracy: 0.9720 - val_loss: 0.0981
Epoch 71/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9831 - loss: 0.0500 - val_accuracy: 0.9688 - val_loss: 0.1009
Epoch 72/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9871 - loss: 0.0457 - val_accuracy: 0.9720 - val_loss: 0.1099
Epoch 73/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9739 - loss: 0.0765 - val_accuracy: 0.9657 - val_loss: 0.1688
Epoch 74/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9868 - loss: 0.1122 - val_accuracy: 0.9688 - val_loss: 0.1504
Epoch 75/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9853 - loss: 0.0464 - val_accuracy: 0.9657 - val_loss: 0.1511
Epoch 76/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9786 - loss: 0.0698 - val_accuracy: 0.9688 - val_loss: 0.1556
Epoch 77/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9843 - loss: 0.0547 - val_accuracy: 0.9720 - val_loss: 0.1808
Epoch 78/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9907 - loss: 0.0304 - val_accuracy: 0.9720 - val_loss: 0.2295
Epoch 79/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9830 - loss: 0.0697 - val_accuracy: 0.9720 - val_loss: 0.1689
Epoch 80/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9905 - loss: 0.0355 - val_accuracy: 0.9657 - val_loss: 0.1695
Epoch 81/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9817 - loss: 0.0478 - val_accuracy: 0.9720 - val_loss: 0.2304
Epoch 82/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9961 - loss: 0.0264 - val_accuracy: 0.9720 - val_loss: 0.1329
Epoch 83/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9791 - loss: 0.0421 - val_accuracy: 0.9688 - val_loss: 0.1419
Epoch 84/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9842 - loss: 0.0492 - val_accuracy: 0.9751 - val_loss: 0.1054
Epoch 85/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9878 - loss: 0.0383 - val_accuracy: 0.9720 - val_loss: 0.1651
Epoch 86/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9916 - loss: 0.0585 - val_accuracy: 0.9688 - val_loss: 0.1481
Epoch 87/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9891 - loss: 0.0375 - val_accuracy: 0.9688 - val_loss: 0.2199
Epoch 88/600
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9847 - loss: 0.0471 - val_accuracy: 0.9657 - val_loss: 0.2256
Epoch 88: early stopping
Out[ ]:
<keras.src.callbacks.history.History at 0x7965cfbd8af0>
In [ ]:
pd.DataFrame(model.history.history).plot()
Out[ ]:
<AxesSubplot:>
No description has been provided for this image
In [ ]:
model.evaluate(X_test_scaled,y_test)
 1/11 ━━━━━━━━━━━━━━━━━━━━ 0s 33ms/step - accuracy: 1.0000 - loss: 0.001111/11 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9717 - loss: 0.2331 
Out[ ]:
[0.22563399374485016, 0.9657320976257324]
In [ ]:
pred = model.predict(X_test_scaled)
pred
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step 
Out[ ]:
array([[1.49767114e-08],
       [9.99996364e-01],
       [9.99989510e-01],
       [2.34794206e-05],
       [3.26990204e-07],
       [4.63475835e-16],
       [1.14782417e-09],
       [1.37853049e-05],
       [2.10904373e-38],
       [1.22355688e-17],
       [9.47022784e-07],
       [9.99997079e-01],
       [1.48535608e-19],
       [8.24223605e-08],
       [1.30682758e-12],
       [2.53171286e-11],
       [2.50650012e-09],
       [1.04336220e-18],
       [1.50063850e-09],
       [7.22405827e-03],
       [1.94349457e-02],
       [1.44995431e-12],
       [1.44852375e-14],
       [9.99998510e-01],
       [1.33410865e-03],
       [9.95549619e-01],
       [4.30452385e-09],
       [9.98654902e-01],
       [1.21342828e-05],
       [1.40959961e-11],
       [0.00000000e+00],
       [4.76045159e-07],
       [6.32333918e-04],
       [8.82855767e-34],
       [2.98582371e-38],
       [2.62632151e-04],
       [9.96635616e-01],
       [9.99975324e-01],
       [9.99907136e-01],
       [3.64588294e-03],
       [5.00209283e-21],
       [9.99977410e-01],
       [6.08151424e-07],
       [3.18250254e-25],
       [4.81809916e-16],
       [9.93347764e-01],
       [5.17030969e-08],
       [2.40078335e-12],
       [2.24314720e-31],
       [8.48743250e-04],
       [9.99987245e-01],
       [1.42118861e-23],
       [1.10301725e-22],
       [1.27917781e-04],
       [9.99976397e-01],
       [2.66162259e-10],
       [9.99967813e-01],
       [1.58057641e-03],
       [3.00077357e-21],
       [7.45777082e-26],
       [9.99976039e-01],
       [9.48741763e-06],
       [1.22351004e-04],
       [5.98280679e-17],
       [2.03145985e-02],
       [1.58556546e-08],
       [1.00000000e+00],
       [3.51858544e-05],
       [9.99987364e-01],
       [0.00000000e+00],
       [1.87447051e-06],
       [2.12559598e-13],
       [2.09591460e-23],
       [3.75203774e-13],
       [2.69616027e-08],
       [1.20476494e-02],
       [1.77243962e-08],
       [9.99995887e-01],
       [3.09584647e-09],
       [3.29492544e-03],
       [8.02365970e-03],
       [0.00000000e+00],
       [4.81129391e-03],
       [1.57202094e-15],
       [2.51497738e-08],
       [9.96932209e-01],
       [9.73208249e-01],
       [0.00000000e+00],
       [7.90176928e-05],
       [1.32435230e-12],
       [1.03168841e-03],
       [9.99996364e-01],
       [1.41271278e-16],
       [9.94130254e-01],
       [1.03362196e-03],
       [8.39420547e-13],
       [5.24250609e-05],
       [9.99992847e-01],
       [3.38503151e-13],
       [9.92281199e-01],
       [9.93868589e-01],
       [3.86682612e-19],
       [1.21180094e-16],
       [1.97507648e-22],
       [9.65301096e-01],
       [9.99989212e-01],
       [2.72299421e-37],
       [1.53624110e-11],
       [2.29853402e-15],
       [9.99976695e-01],
       [9.91829634e-01],
       [1.46108699e-23],
       [1.51821857e-15],
       [6.18548249e-14],
       [7.62617029e-03],
       [9.99977469e-01],
       [1.01631746e-15],
       [2.30706161e-13],
       [9.93021846e-01],
       [4.26051875e-29],
       [9.97312963e-01],
       [9.88815129e-01],
       [0.00000000e+00],
       [2.55372271e-25],
       [5.73282903e-13],
       [2.15930594e-15],
       [7.59117454e-02],
       [5.33174352e-17],
       [9.41204607e-01],
       [5.49186974e-10],
       [4.84825259e-20],
       [2.93265412e-09],
       [2.23284303e-12],
       [1.59339200e-32],
       [9.95716528e-06],
       [9.98423755e-01],
       [9.97991085e-01],
       [9.99936044e-01],
       [4.63728655e-09],
       [8.33506278e-13],
       [5.90753843e-05],
       [5.11248027e-06],
       [5.72733223e-01],
       [3.26046045e-32],
       [9.99992728e-01],
       [2.55879841e-21],
       [2.52899272e-17],
       [9.99996662e-01],
       [1.93697922e-02],
       [1.30361496e-02],
       [2.79380589e-30],
       [3.88123267e-09],
       [7.10944172e-34],
       [9.78153795e-02],
       [9.99908388e-01],
       [1.31809007e-04],
       [6.02606923e-11],
       [8.24615515e-11],
       [2.59907931e-01],
       [9.99790788e-01],
       [6.59610494e-04],
       [4.44521662e-04],
       [2.67849828e-06],
       [9.99981225e-01],
       [3.49268946e-03],
       [9.99951541e-01],
       [1.55403186e-12],
       [9.99987900e-01],
       [2.35916131e-09],
       [9.68408644e-01],
       [4.40323877e-14],
       [2.69645341e-02],
       [9.36884437e-09],
       [9.98465776e-01],
       [7.43292091e-18],
       [5.03324941e-07],
       [2.56846278e-08],
       [1.61483132e-12],
       [3.24631259e-07],
       [9.55076871e-07],
       [4.07488848e-13],
       [9.02817305e-07],
       [1.95152041e-13],
       [9.99916255e-01],
       [1.11427201e-09],
       [8.36632680e-03],
       [9.94852602e-01],
       [1.17478777e-14],
       [5.21913250e-07],
       [6.39253904e-18],
       [5.01019948e-10],
       [9.95687246e-01],
       [6.14100462e-03],
       [9.99992430e-01],
       [2.36836559e-06],
       [9.99943733e-01],
       [1.53158235e-05],
       [9.99977529e-01],
       [6.08034456e-11],
       [0.00000000e+00],
       [7.45354271e-07],
       [4.38922256e-31],
       [9.93912041e-01],
       [0.00000000e+00],
       [6.47000771e-20],
       [9.99987125e-01],
       [4.42098162e-24],
       [7.86738191e-03],
       [5.20298757e-26],
       [9.74497606e-31],
       [9.99940813e-01],
       [9.83121693e-01],
       [9.99517262e-01],
       [4.81714703e-11],
       [2.12542174e-04],
       [1.46269752e-17],
       [4.27205709e-14],
       [3.30934698e-22],
       [7.81060815e-01],
       [4.79755229e-11],
       [5.99481564e-06],
       [9.98857737e-01],
       [9.99987185e-01],
       [9.99947667e-01],
       [1.73099844e-15],
       [9.99998868e-01],
       [2.88991927e-04],
       [9.89477634e-01],
       [9.99995291e-01],
       [4.88308300e-16],
       [1.05363481e-23],
       [8.00057555e-30],
       [1.12310867e-03],
       [1.34212414e-05],
       [2.22730218e-04],
       [1.65804653e-04],
       [1.19396466e-17],
       [9.99995232e-01],
       [1.74745335e-04],
       [1.04058989e-14],
       [9.80576673e-22],
       [9.99936700e-01],
       [1.01002315e-05],
       [4.59906581e-13],
       [9.52170491e-01],
       [3.81837339e-07],
       [4.42785386e-04],
       [2.36423517e-07],
       [9.80862260e-01],
       [4.70835261e-07],
       [2.17549299e-24],
       [8.76087185e-08],
       [9.99983728e-01],
       [2.34075433e-05],
       [2.33462290e-03],
       [7.09961325e-24],
       [1.69196667e-07],
       [9.99958098e-01],
       [9.99980152e-01],
       [9.99996781e-01],
       [9.99981403e-01],
       [9.30789160e-04],
       [9.54221785e-01],
       [6.60807589e-11],
       [4.88410133e-26],
       [6.79721710e-19],
       [2.33659174e-16],
       [5.79205283e-04],
       [9.68092799e-01],
       [5.92151952e-11],
       [1.45777332e-17],
       [1.46390852e-02],
       [6.23501725e-22],
       [6.57781857e-06],
       [2.29034387e-03],
       [2.33601115e-17],
       [9.99997020e-01],
       [2.96037251e-05],
       [9.99976873e-01],
       [2.85196786e-20],
       [1.39045928e-18],
       [9.99954462e-01],
       [9.84932721e-01],
       [3.13463433e-09],
       [9.17021228e-17],
       [9.99952197e-01],
       [9.99993145e-01],
       [2.25755764e-04],
       [2.09895666e-06],
       [1.81972926e-12],
       [2.09203805e-04],
       [1.08529506e-15],
       [9.78815973e-01],
       [2.25965584e-22],
       [9.75841741e-10],
       [3.77770519e-30],
       [1.65859988e-15],
       [4.11712192e-03],
       [1.53636131e-06],
       [2.08774209e-02],
       [1.35833941e-15],
       [9.61844790e-08],
       [2.55301922e-34],
       [2.87332514e-04],
       [1.62998192e-27],
       [4.29834199e-06],
       [5.29569425e-02],
       [4.29428385e-19],
       [1.52198347e-28],
       [9.95288432e-01],
       [9.99964297e-01],
       [5.27848978e-13],
       [8.04141820e-09],
       [1.53164336e-07],
       [4.19612347e-17],
       [6.81271078e-04],
       [9.99977112e-01],
       [6.10742706e-11],
       [1.28324442e-02],
       [9.99957740e-01],
       [1.84369998e-15]], dtype=float32)
In [ ]:
pred = np.where(pred > 0.5, 1 ,0)
pred
Out[ ]:
array([[0],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [1],
       [0],
       [0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [1],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [1],
       [1],
       [0],
       [1],
       [0],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [0],
       [1],
       [0]])
In [ ]:
print(classification_report(y_test,pred))
              precision    recall  f1-score   support

         0.0       0.98      0.97      0.98       237
         1.0       0.92      0.95      0.94        84

    accuracy                           0.97       321
   macro avg       0.95      0.96      0.96       321
weighted avg       0.97      0.97      0.97       321

In [ ]:
sns.heatmap(confusion_matrix(y_test,pred),annot=True,fmt='d')
Out[ ]:
<AxesSubplot:>
No description has been provided for this image

We've obtained a really strong model, even better than what we achieved with Support Vector Machines. Achieving an accuracy of around 97% on the test set is very satisfactory for us. We won't be testing any further with this classification of gradient networks; we'll now move on to Extreme Learning Machine (ELM), which is a type of Single Hidden Layer Feedforward Network (SLFN). Our ELM class is located in the file ELM.py, which we have already imported at the beginning. It's worth mentioning that during initialization, it takes three main input parameters: the number of features, the number of hidden neurons, and the number of classes.

In [ ]:
model2 = ELM(
    14,
    200,
    2
)
In [ ]:
y_train_onehot = to_categorical(y_train,2)
y_test_onehot = to_categorical(y_test,2)
model2.fit(X_train_scaled,y_train_onehot)
In [ ]:
train_pred = model2.pred(X_train_scaled)
test_pred = model2.pred(X_test_scaled)
model2.acc(train_pred,y_train),model2.acc(test_pred,y_test)
Out[ ]:
(0.959375, 0.9221183800623053)

As we can see, with just a small number of hidden neurons, we achieved a model that is 92% accurate on the test set, making it a strong competitor for SVM. And remember, we're not introducing any additional parameters here. We can now check what the confusion matrix looks like for our results.

In [ ]:
model2.cm(test_pred, y_test)
No description has been provided for this image
In [ ]:
print(classification_report(y_test, test_pred))
              precision    recall  f1-score   support

         0.0       0.93      0.95      0.94       237
         1.0       0.86      0.81      0.83        84

    accuracy                           0.92       321
   macro avg       0.90      0.88      0.89       321
weighted avg       0.91      0.92      0.92       321

It seems quite satisfying to me, although the classes themselves are not evenly balanced, our model generally makes mistakes evenly, both with healthy and sick individuals. Let's now try to conduct something akin to grid search, but without using cross-validation. Since our function doesn't come from the scikit-learn package, we'll write a short code that allows us to perform such grid search.

In [ ]:
accuracy_list = pd.DataFrame(columns=['Number of features','Number of hidden neurons','Accuracy'])

for j in range(50,400,10):
    for i in [7,8,9,10,11,12,13,14]:
        pca = PCA(n_components=i).fit(X_train_scaled)
        X_train_pca = pca.transform(X_train_scaled)
        X_test_pca  = pca.transform(X_test_scaled)
        model = ELM(
        i,
        j,
        9
        )
        model.fit(X_train_pca, y_train_onehot)
        test_pred = model.pred(X_test_pca)
        accuracy = model.acc(test_pred, y_test)
        new_row = pd.DataFrame(columns=['Number of features','Number of hidden neurons','Accuracy'],data=[[i,j,accuracy]])
        accuracy_list = pd.concat([accuracy_list,new_row],ignore_index=True)

accuracy_list.loc[accuracy_list['Accuracy'] == accuracy_list['Accuracy'].max()]
C:\Users\pawel.drzyzga\AppData\Local\Temp\ipykernel_6588\3928860975.py:17: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  accuracy_list = pd.concat([accuracy_list,new_row],ignore_index=True)
Out[ ]:
Number of features Number of hidden neurons Accuracy
133 12 210 0.931464
135 14 210 0.931464
219 10 320 0.931464
In [ ]:
model = ELM(10,250,9)
pca = PCA(n_components=10).fit(X_train_scaled)
X_train_pca = pca.transform(X_train_scaled)
X_test_pca  = pca.transform(X_test_scaled)
model.fit(X_train_pca,y_train_onehot)
pred = model.pred(X_test_pca)

plt.figure(figsize=(10,5))
sns.heatmap(confusion_matrix(y_test,pred), annot=True, fmt='d')
plt.xlabel('Predict')
plt.ylabel('Actuval Value')
Out[ ]:
Text(95.72222222222221, 0.5, 'Actuval Value')
No description has been provided for this image

Our search led us to find a model that generalizes effectively at a level of over 93%, which may not be as good as what we achieved with SVM, but it certainly can be promising. It's worth noting that the ELM itself is quite "rudimentary" in our case; if we were to expand it, add parameters, we could achieve even better results. Interestingly, the best result was obtained for a model with the parameter n_components = 10.

3. Multiclass classification¶

At the outset, of course, we need to get rid of the column that corresponds to our binary class division.

In [ ]:
data2 = data.drop(columns=['Healthy'])
data2
Out[ ]:
WBC LYMp NEUTp LYMn NEUTn RBC HGB HCT MCV MCH MCHC PLT PDW PCT Diagnosis
0 10.00 43.200 50.100 4.30000 5.00000 2.77 7.3 24.2000 87.7 26.3 30.1 189.0 12.500000 0.17000 Normocytic hypochromic anemia
1 10.00 42.400 52.300 4.20000 5.30000 2.84 7.3 25.0000 88.2 25.7 20.2 180.0 12.500000 0.16000 Normocytic hypochromic anemia
2 7.20 30.700 60.700 2.20000 4.40000 3.97 9.0 30.5000 77.0 22.6 29.5 148.0 14.300000 0.14000 Iron deficiency anemia
3 6.00 30.200 63.500 1.80000 3.80000 4.22 3.8 32.8000 77.9 23.2 29.8 143.0 11.300000 0.12000 Iron deficiency anemia
4 4.20 39.100 53.700 1.60000 2.30000 3.93 0.4 316.0000 80.6 23.9 29.7 236.0 12.800000 0.22000 Normocytic hypochromic anemia
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1276 4.40 25.845 77.511 1.88076 5.14094 4.86 13.5 46.1526 80.7 27.7 34.4 180.0 14.312512 0.26028 Healthy
1277 5.60 25.845 77.511 1.88076 5.14094 4.85 15.0 46.1526 91.7 31.0 33.8 215.0 14.312512 0.26028 Healthy
1278 9.20 25.845 77.511 1.88076 5.14094 4.47 13.1 46.1526 88.7 29.3 33.0 329.0 14.312512 0.26028 Healthy
1279 6.48 25.845 77.511 1.88076 5.14094 4.75 13.2 46.1526 86.7 27.9 32.1 174.0 14.312512 0.26028 Healthy
1280 8.80 25.845 77.511 1.88076 5.14094 4.95 15.2 46.1526 89.7 30.6 34.2 279.0 14.312512 0.26028 Healthy

1281 rows × 15 columns

Now we will encode our data, which is essential when it comes to entering the topic of multiclass classification. Additionally, we will display our labels, which can help us in later interpretation.

In [ ]:
data2['Diagnosis_cod'] = LabelEncoder().fit_transform(data2['Diagnosis'])
mapping_dict = {data2['Diagnosis'].unique()[i]: data2['Diagnosis_cod'].unique()[i] for i in range(len(data2['Diagnosis'].unique()))} 
X2 = data2.drop(columns=['Diagnosis','Diagnosis_cod'])
y2 = data2['Diagnosis_cod']
mapping_dict
Out[ ]:
{'Normocytic hypochromic anemia': 5,
 'Iron deficiency anemia': 1,
 'Other microcytic anemia': 7,
 'Leukemia': 2,
 'Healthy': 0,
 'Thrombocytopenia': 8,
 'Normocytic normochromic anemia': 6,
 'Leukemia with thrombocytopenia': 3,
 'Macrocytic anemia': 4}

Now we split the data and then apply one-hot encoding, which is necessary if we want to use our Extreme Learning Machine. We didn't do this earlier because we only had classes 0 and 1.

In [ ]:
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2, random_state=42, stratify=y2)
In [ ]:
y2_train_onehot = to_categorical(y2_train, 9)
y2_test_onehot = to_categorical(y2_test, 9)
In [ ]:
scaler = StandardScaler().fit(X2_train)
X2_train_scaled = scaler.transform(X2_train)
X2_test_scaled = scaler.transform(X2_test)

We're starting by applying the Extreme Learning Machine, using random settings for now. Let's see what we come up with.

In [ ]:
model = ELM(
    14,
    175,
    9
)
In [ ]:
model.fit(X2_train_scaled, y2_train_onehot)
In [ ]:
test_pred = model.pred(X2_test_scaled)
In [ ]:
plt.figure(figsize=(10,5))
sns.heatmap(confusion_matrix(y2_test,test_pred), annot=True, fmt='d')
plt.xlabel('Predict')
plt.ylabel('Actuval Value')
Out[ ]:
Text(95.72222222222221, 0.5, 'Actuval Value')
No description has been provided for this image
In [ ]:
model.acc(test_pred, y2_test)
Out[ ]:
0.7570093457943925

To start, we're getting a model that has 75% accuracy on the training set. This isn't a bad result. We need to consider that we've entered into multiclass classification, which means not only do we have 9 classes instead of 2, but also that the imbalance in class sizes (as we observed at the beginning) may leave much to be desired. However, there's no need to despair; let's proceed to apply our grid search to try to find the best configuration for our Extreme Learning Machine.

In [ ]:
accuracy_list = pd.DataFrame(columns=['Number of features','Number of hidden neurons','Accuracy'])
for j in range(100,350,5):
    for i in [10,11,12,13,14]:
        pca = PCA(n_components=i).fit(X2_train_scaled)
        X2_train_pca = pca.transform(X2_train_scaled)
        X2_test_pca  = pca.transform(X2_test_scaled)
        model = ELM(
        i,
        j,
        9
        )
        model.fit(X2_train_pca, y2_train_onehot)
        test_pred = model.pred(X2_test_pca)
        accuracy = model.acc(test_pred, y2_test)
        new_row = pd.DataFrame(columns=['Number of features','Number of hidden neurons','Accuracy'],data=[[i,j,accuracy]])
        accuracy_list = pd.concat([accuracy_list,new_row],ignore_index=True)
C:\Users\pawel.drzyzga\AppData\Local\Temp\ipykernel_6588\2139844260.py:16: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  accuracy_list = pd.concat([accuracy_list,new_row],ignore_index=True)
In [ ]:
accuracy_list.loc[accuracy_list['Accuracy'] == accuracy_list['Accuracy'].max()]
Out[ ]:
Number of features Number of hidden neurons Accuracy
138 13 235 0.813084
In [ ]:
pca = PCA(n_components=12).fit(X2_train_scaled)
X2_train_pca = pca.transform(X2_train_scaled)
X2_test_pca  = pca.transform(X2_test_scaled)
model = ELM(
12,
220,
9
)
model.fit(X2_train_pca, y2_train_onehot)
test_pred = model.pred(X2_test_pca)
model.cm(test_pred, y2_test)
No description has been provided for this image

We receive a slightly improved model that accurately generalizes in over 81% of cases. This gives us a 6 percentage point improvement, which is definitely a good result. It's not the same as with SVM, but here we had much less to do in terms of parameter tuning, and the training time was also significantly shorter. Now let's move on to logistic regression and see how it performs. First, we'll try it with default settings, and then we'll explore some options for the regularization parameter.

In [ ]:
LogisticRegression().fit(X2_train_scaled,y2_train).score(X2_test_scaled,y2_test)
Out[ ]:
0.7320872274143302
In [ ]:
pipeline = make_pipeline(
    StandardScaler(),
    PCA(),
    LogisticRegression()
)

param_grid =[
    {
        'pca__n_components': [10, 11, 12, 13, 14],
        'logisticregression__C': [0.01, 0.1, 1, 10, 100, 1000, 5000, 10000, 50000],
        'logisticregression__max_iter': [1000] 
    }
]

gridsearch1 = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=10).fit(X2_train,y2_train)
c:\Users\pawel.drzyzga\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 8 members, which is less than n_splits=10.
  warnings.warn(
In [ ]:
gridsearch1.best_estimator_
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('logisticregression',
                 LogisticRegression(C=5000, max_iter=1000))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('logisticregression',
                 LogisticRegression(C=5000, max_iter=1000))])
StandardScaler()
PCA(n_components=14)
LogisticRegression(C=5000, max_iter=1000)
In [ ]:
gridsearch1.score(X2_test, y2_test)
Out[ ]:
0.8068535825545171
In [ ]:
sns.heatmap(confusion_matrix(y2_test, gridsearch1.predict(X2_test)), annot=True)
Out[ ]:
<Axes: >
No description has been provided for this image

Initially, we get a model that predicts with an accuracy of 73%, which is the worst result we've obtained so far. However, using GridSearchCV, we attempt to find the best parameter tuning, and eventually, our accuracy increases to 80%. A 7 percentage point increase is quite substantial for such a result. It's nearly the same as with the Extreme Learning Machine, but I believe there's definitely more potential for improvement with ELM. Let's now move directly to SVM. We won't create a model with default parameters; instead, we'll start with grid search right away, then analyze it, and step by step try to find the best model.

In [ ]:
pipeline1 = make_pipeline(
    StandardScaler(),
    PCA(),
    SVC(probability=True)
)

param_grid =[
    {
        'pca__n_components': [10,11,12,14],
        "svc__kernel": ['rbf'], 
        'svc__C': [0.001, 0.01, 0.1, 1, 10, 50, 100,500,1000], 
        'svc__gamma': [0.0001,0.001, 0.01, 0.1, 1, 10, 50, 100]
    },
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X2_train,y2_train)
gridsearch1.best_estimator_
c:\Users\pawel.drzyzga\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 8 members, which is less than n_splits=10.
  warnings.warn(
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc', SVC(C=500, gamma=0.01))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc', SVC(C=500, gamma=0.01))])
StandardScaler()
PCA(n_components=14)
SVC(C=500, gamma=0.01)
In [ ]:
gridsearch1.best_score_
Out[ ]:
0.8791666666666667
In [ ]:
gridsearch1.score(X2_test,y2_test)
Out[ ]:
0.8878504672897196
In [ ]:
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
In [ ]:
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")

sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
No description has been provided for this image
In [ ]:
sns.heatmap(confusion_matrix(y2_test, gridsearch1.predict(X2_test)), annot=True)
Out[ ]:
<Axes: >
No description has been provided for this image

At the very beginning, we already receive a quite strong model, definitely the best among those we've created. However, we notice that the accuracy values start to increase as we move towards the northeast direction. Let's try to steer in that direction and find an even better model.

In [ ]:
pipeline1 = make_pipeline(
    StandardScaler(),
    PCA(),
    SVC(probability=True)
)

param_grid =[
    {
        'pca__n_components': [14],
        "svc__kernel": ['rbf'], 
        'svc__C': np.linspace(10,5000,15), 
        'svc__gamma': np.linspace(.0001,.1,15)
    },
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X2_train,y2_train)
gridsearch1.best_estimator_
c:\Users\pawel.drzyzga\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 8 members, which is less than n_splits=10.
  warnings.warn(
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc', SVC(C=3930.714285714286, gamma=0.007235714285714286))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc', SVC(C=3930.714285714286, gamma=0.007235714285714286))])
StandardScaler()
PCA(n_components=14)
SVC(C=3930.714285714286, gamma=0.007235714285714286)
In [ ]:
gridsearch1.score(X2_test,y2_test)
Out[ ]:
0.9003115264797508
In [ ]:
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
In [ ]:
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")

plt.figure(figsize=(15,10))
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
No description has been provided for this image
In [ ]:
sns.heatmap(confusion_matrix(y2_test, gridsearch1.predict(X2_test)), annot=True)
Out[ ]:
<Axes: >
No description has been provided for this image

Our intuition was not wrong; we've already gained a few percentage points on the test set. Just by looking at the heatmap, it's apparent that it might be getting difficult to further improve the model, but let's try expanding the range of parameter search a bit more. Perhaps we'll manage to unearth something.

In [ ]:
pipeline1 = make_pipeline(
    StandardScaler(),
    PCA(),
    SVC(probability=True)
)

param_grid =[
    {
        'pca__n_components': [14],
        "svc__kernel": ['rbf'], 
        'svc__C': np.linspace(1500,50000,15), 
        'svc__gamma': np.linspace(.0001,.005,15)
    },
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X2_train,y2_train)
gridsearch1.best_estimator_
c:\Users\pawel.drzyzga\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 8 members, which is less than n_splits=10.
  warnings.warn(
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc', SVC(C=50000.0, gamma=0.00115))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc', SVC(C=50000.0, gamma=0.00115))])
StandardScaler()
PCA(n_components=14)
SVC(C=50000.0, gamma=0.00115)
In [ ]:
gridsearch1.score(X2_test,y2_test)
Out[ ]:
0.9034267912772586
In [ ]:
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
In [ ]:
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")

plt.figure(figsize=(15,10))
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
No description has been provided for this image
In [ ]:
pipeline1 = make_pipeline(
    StandardScaler(),
    PCA(),
    SVC(probability=True)
)

param_grid =[
    {
        'pca__n_components': [14],
        "svc__kernel": ['rbf'], 
        'svc__C': np.linspace(20000,170000,15), 
        'svc__gamma': np.linspace(.0001,.00325,10)
    },
]
gridsearch1 = GridSearchCV(estimator=pipeline1, param_grid=param_grid, cv=10).fit(X2_train,y2_train)
gridsearch1.best_estimator_
c:\Users\pawel.drzyzga\AppData\Local\Programs\Python\Python310\lib\site-packages\sklearn\model_selection\_split.py:737: UserWarning: The least populated class in y has only 8 members, which is less than n_splits=10.
  warnings.warn(
Out[ ]:
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc', SVC(C=127142.85714285713, gamma=0.0008))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
Pipeline(steps=[('standardscaler', StandardScaler()),
                ('pca', PCA(n_components=14)),
                ('svc', SVC(C=127142.85714285713, gamma=0.0008))])
StandardScaler()
PCA(n_components=14)
SVC(C=127142.85714285713, gamma=0.0008)
In [ ]:
gridsearch1.score(X2_test,y2_test)
Out[ ]:
0.9096573208722741
In [ ]:
results = pd.DataFrame(gridsearch1.cv_results_)
filtered_results = results.loc[results['param_pca__n_components'] == 14]
filtered_results = filtered_results[['param_svc__C', 'param_svc__gamma', 'mean_test_score']]
In [ ]:
pivot_table = filtered_results.pivot(index="param_svc__gamma", columns="param_svc__C", values="mean_test_score")

plt.figure(figsize=(25,10))
sns.heatmap(pivot_table, annot=True, cmap="viridis", cbar_kws={'label': 'Mean Test Score'})
plt.title('Heatmapa Mean Test Score for different param_svc__C i param_svc__gamma')
plt.xlabel('param_svc__C')
plt.ylabel('param_svc__gamma')
plt.show()
No description has been provided for this image
In [ ]:
sns.heatmap(confusion_matrix(y2_test, gridsearch1.predict(X2_test)), annot=True)
Out[ ]:
<Axes: >
No description has been provided for this image

Finally, we find what is arguably the best model we can currently achieve using SVM, and if not the best, then certainly very satisfactory. An accuracy level of over 90% in multiclass classification is certainly optimistic. The obstacle to achieving a higher score may simply be that some classes are very small in size. We can try to use gradient neural networks again.

In [ ]:
model = Sequential()
model.add(layers.Flatten(input_shape=(14,1)))
model.add(layers.Dense(112, activation='relu'))
model.add(Dropout(0.2))
model.add(layers.Dense(112, activation='relu'))
model.add(Dropout(0.2))
model.add(layers.Dense(112, activation='relu'))
model.add(Dropout(0.2))
model.add(layers.Dense(9, activation='softmax')) 
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy']) 
/home/pawel/.local/lib/python3.10/site-packages/keras/src/layers/reshaping/flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)
In [ ]:
early_stop = EarlyStopping(monitor ='val_loss',mode='min',verbose=1,patience=25)
In [ ]:
model.fit(x = X2_train_scaled,y = y2_train_onehot,batch_size=2,validation_data=(X2_test_scaled,y2_test_onehot),epochs=100,callbacks=[early_stop])
Epoch 1/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.3647 - loss: 1.8143 - val_accuracy: 0.6417 - val_loss: 1.2572
Epoch 2/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.6200 - loss: 1.1091 - val_accuracy: 0.7134 - val_loss: 0.8487
Epoch 3/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.7260 - loss: 0.8441 - val_accuracy: 0.7944 - val_loss: 0.7356
Epoch 4/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.7751 - loss: 0.6654 - val_accuracy: 0.7664 - val_loss: 0.7381
Epoch 5/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.7586 - loss: 0.6412 - val_accuracy: 0.8287 - val_loss: 0.6162
Epoch 6/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8283 - loss: 0.4976 - val_accuracy: 0.8100 - val_loss: 0.5986
Epoch 7/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8258 - loss: 0.5001 - val_accuracy: 0.8318 - val_loss: 0.9454
Epoch 8/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8234 - loss: 0.5284 - val_accuracy: 0.8474 - val_loss: 0.7701
Epoch 9/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8571 - loss: 0.4271 - val_accuracy: 0.8474 - val_loss: 0.6140
Epoch 10/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8492 - loss: 0.4168 - val_accuracy: 0.8411 - val_loss: 0.7402
Epoch 11/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8679 - loss: 0.3667 - val_accuracy: 0.8536 - val_loss: 0.6468
Epoch 12/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8385 - loss: 0.3734 - val_accuracy: 0.8505 - val_loss: 0.6060
Epoch 13/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8952 - loss: 0.3370 - val_accuracy: 0.8723 - val_loss: 0.6300
Epoch 14/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8853 - loss: 0.3218 - val_accuracy: 0.8847 - val_loss: 0.7179
Epoch 15/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8827 - loss: 0.3035 - val_accuracy: 0.8536 - val_loss: 0.6227
Epoch 16/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9007 - loss: 0.2962 - val_accuracy: 0.8910 - val_loss: 0.4081
Epoch 17/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9250 - loss: 0.2346 - val_accuracy: 0.8879 - val_loss: 0.5004
Epoch 18/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8982 - loss: 0.2848 - val_accuracy: 0.8692 - val_loss: 0.4343
Epoch 19/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9194 - loss: 0.2377 - val_accuracy: 0.8692 - val_loss: 0.5852
Epoch 20/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9460 - loss: 0.1755 - val_accuracy: 0.8816 - val_loss: 0.4335
Epoch 21/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9148 - loss: 0.2546 - val_accuracy: 0.8785 - val_loss: 0.6249
Epoch 22/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9201 - loss: 0.2323 - val_accuracy: 0.8785 - val_loss: 0.7322
Epoch 23/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9264 - loss: 0.2127 - val_accuracy: 0.8692 - val_loss: 0.6674
Epoch 24/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9313 - loss: 0.1978 - val_accuracy: 0.8754 - val_loss: 0.6482
Epoch 25/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9423 - loss: 0.1780 - val_accuracy: 0.8816 - val_loss: 0.6170
Epoch 26/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9283 - loss: 0.2024 - val_accuracy: 0.8847 - val_loss: 0.5863
Epoch 27/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9245 - loss: 0.2224 - val_accuracy: 0.8910 - val_loss: 0.4938
Epoch 28/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9307 - loss: 0.2254 - val_accuracy: 0.8972 - val_loss: 0.5071
Epoch 29/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9320 - loss: 0.1933 - val_accuracy: 0.8754 - val_loss: 0.4960
Epoch 30/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9339 - loss: 0.1872 - val_accuracy: 0.8910 - val_loss: 0.5179
Epoch 31/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9485 - loss: 0.1907 - val_accuracy: 0.8879 - val_loss: 0.5998
Epoch 32/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9368 - loss: 0.2432 - val_accuracy: 0.8941 - val_loss: 0.6715
Epoch 33/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9544 - loss: 0.1266 - val_accuracy: 0.8941 - val_loss: 0.6721
Epoch 34/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9293 - loss: 0.1896 - val_accuracy: 0.9097 - val_loss: 0.6837
Epoch 35/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9531 - loss: 0.1456 - val_accuracy: 0.8567 - val_loss: 0.8032
Epoch 36/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9419 - loss: 0.1533 - val_accuracy: 0.9034 - val_loss: 0.6545
Epoch 37/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9436 - loss: 0.1395 - val_accuracy: 0.8723 - val_loss: 0.7557
Epoch 38/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9526 - loss: 0.1871 - val_accuracy: 0.8692 - val_loss: 0.8177
Epoch 39/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9475 - loss: 0.1882 - val_accuracy: 0.8972 - val_loss: 0.6315
Epoch 40/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9490 - loss: 0.1988 - val_accuracy: 0.8879 - val_loss: 0.6939
Epoch 41/100
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9470 - loss: 0.1623 - val_accuracy: 0.9097 - val_loss: 0.5777
Epoch 41: early stopping
Out[ ]:
<keras.src.callbacks.history.History at 0x7965242adc00>
In [ ]:
pd.DataFrame(model.history.history).plot()
Out[ ]:
<AxesSubplot:>
No description has been provided for this image
In [ ]:
model.evaluate(X2_test_scaled,y2_test_onehot)
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - accuracy: 0.9094 - loss: 0.4510 
Out[ ]:
[0.5776535272598267, 0.9096572995185852]
In [ ]:
pred = model.predict(X2_test_scaled)
pred = np.argmax(pred,axis=1)
11/11 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step 
In [ ]:
sns.heatmap(confusion_matrix(y2_test,pred),annot=True)
Out[ ]:
<AxesSubplot:>
No description has been provided for this image

To very short conclude, we received an model with a similarly accuracy as the best SVM model.

4. Concluding Remarks on the Entire Project¶

Project Objective¶

The aim of the project was to classify medical data related to blood parameters to accurately diagnose patients' health conditions. Various machine learning algorithms were applied to determine which one best predicts outcomes based on input data.

Initial Data Analysis¶

The data included significant features such as platelet count (PLT), hemoglobin concentration (HGB), and other blood parameters. Exploratory data analysis (EDA) revealed that some features have strong correlations with medical diagnoses.

Data Transformation¶

  • Outcome variables were binary encoded, facilitating modeling.
  • Data standardization and principal component analysis (PCA) were applied to reduce dimensionality and information redundancy.

Modeling¶

  • Logistic regression and support vector machine (SVM) with different kernels (RBF and linear) were used for classification, along with a simple neural network from the TensorFlow package, and the Extreme Learning Machine.
  • Data were split into training and test sets to assess overall model performance.

Results¶

  • Logistic regression achieved an accuracy of 87% on the test set, indicating solid, though not ideal, performance.
  • SVM with RBF kernel achieved the highest accuracy at 95%, suggesting it's the most effective model in this case.
  • Gradient-based neural network also showed fairly high accuracy, very close to SVM with RBF kernel in multiclass, but slightly higher than SVM in binary classification.
  • Extreme Learning Machine was the biggest surprise of the project, with its surprisingly high accuracy given its simplicity.

Visualization of Results¶

  • Confusion matrices and PCA plots helped visualize how well models perform in classification and which classes are most frequently misclassified.
  • ROC curves and AUC values showed that models distinguish between classes well, with SVM with RBF kernel having the highest AUC value.

Key Insights¶

  1. Model Selection: Gradient-based neural network proved to be the best model for classifying medical data, achieving the highest accuracy and best results in other evaluation metrics.
  2. Feature Importance: Features such as platelet count (PLT) and hemoglobin concentration (HGB) play a crucial role in diagnosing health conditions, as confirmed by high correlation coefficients and importance in models.
  3. Data Transformation: Data standardization and dimensionality reduction through PCA improved model performance, suggesting these steps are important in processing medical data.
  4. Application of Machine Learning: Machine learning models, such as SVM or Gradient-based neural network, are effective in classifying medical data and can be applied in clinical practice to assist with diagnoses.

In conclusion, the project demonstrated that appropriately chosen and processed machine learning algorithms can significantly improve health condition diagnoses based on medical data.